Data-Driven Synthesis of Fundamental Frequency Contours for TTS Systems Based on a Generation Process Model
Authors
Abstract
A data-driven method of fundamental frequency (F0) contour synthesis was developed for Japanese text-to-speech (TTS) conversion systems. In this method, synthesis is done using the F0 contour generation process model, and the model parameters for each accent phrase are estimated using statistical methods. Although it had already been shown that the synthesized F0 contours sounded as natural as those generated using heuristic rules arranged by experts, quality was occasionally low depending on the sentence being synthesized. In the current paper, information on sentence structure, automatically obtainable through the parsing process, is added to the input parameters of the statistical methods to obtain a better estimation. The experimental results showed that the new parameter was effective, especially for improving phrase component estimation. Furthermore, data-driven estimation of accent phrase boundaries for input text, a necessary step for TTS conversion, was also realized in a similar way. The rate of correct estimation reached 90%.
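The abstract does not give the equations of the generation process model, so the following is only a minimal sketch of the commonly cited formulation (often called the Fujisaki model), in which ln F0(t) is a baseline plus phrase components (impulse responses) and accent components (step responses). The function names, the constants alpha, beta and gamma, and all command values below are illustrative assumptions, not values from the paper; in the method described above, the command values for each accent phrase would come from the statistical estimators.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: Gp(t) = alpha^2 * t * exp(-alpha*t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + sum Ap*Gp(t - T0) + sum Aa*(Ga(t - T1) - Ga(t - T2))."""
    value = math.log(fb)
    for ap, t0 in phrase_cmds:        # (magnitude, onset time in seconds)
        value += ap * phrase_response(t - t0, alpha)
    for aa, t1, t2 in accent_cmds:    # (amplitude, onset time, offset time)
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value

def synthesize_f0(times, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time point."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds)) for t in times]

# Hypothetical commands for a single accent phrase (made-up values).
phrase_cmds = [(0.5, 0.0)]
accent_cmds = [(0.4, 0.3, 0.8)]
times = [i * 0.01 for i in range(200)]   # 2 seconds sampled every 10 ms
contour = synthesize_f0(times, 120.0, phrase_cmds, accent_cmds)
```

Because the contour is fully determined by a handful of command magnitudes and timings, a statistical predictor only has to output those few values per accent phrase rather than a frame-by-frame F0 track.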
Similar resources
Generating fundamental frequency contours for speech synthesis in Yorùbá
We present methods for modelling and synthesising fundamental frequency (F0) contours suitable for application in text-to-speech (TTS) synthesis of Yorùbá (an African tone language). These methods are discussed and compared with a baseline approach using the HMM-based speech synthesis system HTS. Evaluation is done by comparing ten-fold cross validation squared errors on a small corpus of four s...
Corpus-based synthesis of fundamental frequency contours based on a generation process model
A mode-constrained corpus-based synthesis strategy was developed for fundamental frequency (F0) contours of Japanese sentences. In the training phase, the relationship between linguistic factors and the command values (amplitudes and locations) of the F0 contour generation process model was learned by a prediction module (a neural network in the current paper). Input parameters consist of linguisti...
A target approximation intonation model for Yorùbá TTS
A complete intonation model based on quantitative target approximation is described for Yorùbá text-to-speech (TTS) synthesis. This model is evaluated analytically and perceptually and compared to a fundamental frequency (F0) model using the standard HTS implementation. Analytical results suggest that the proposed approach more efficiently models F0 contours given typical data constraints in un...
Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis
The generation process model of fundamental frequency contours, known as Fujisaki's model, is ideal for representing global features of prosody. It is a command-response model, where the commands have clear relations with the linguistic and para-/non-linguistic information included in the utterance. Therefore, by controlling fundamental frequency contours in the framework of the generation process model, a mo...
Smooth contour estimation in data-driven pitch modelling
Apple's next-generation text-to-speech (TTS) system in MacOS X uses a superpositional pitch model, comprising a relatively smooth underlying F0 contour and a separate contribution from the influence of the phonetic segments. This paper focuses on the data-driven modelling of the underlying contour, based on electroglottographic signals obtained from a corpus of reiterant speech. F0 extraction fr...
Publication date: 2002